Updated: a roundup of the tools and scripts 苏生不惑 has developed
This is the 382nd original article from 苏生不惑. Star this official account to see new articles as soon as they are published.
This official account has published more than 380 original articles and shared many tools and scripts, some of which I wrote myself. Here is an updated roundup:
Download links for some of the tools have been moved to my 知识星球 (it already has several hundred posts and is updated almost daily). Inside the planet, click the 公众号 tag to find them; scan the QR code in WeChat to join.
Official account article/audio/video/topic downloads
This one I wrote in Python; reply 公众号 in the chat of the official account 苏生不惑 to get the download link. It started with a reader's question (see "因为读者的一个问题,我写了个公众号批量下载工具"): enter an article URL and the tool batch-downloads the audio inside the article. The result:
Enter a topic URL and the download looks like this:
The downloaded article links are saved to wechat_topic_list.txt; on a second run, articles that were already downloaded are skipped:
Pure-audio posts in a topic are also supported. This time I rewrote the tool in Go (see "golang 开发入门教程,顺便写了个公众号批量下载工具"); the packaged exe now runs on Windows 7. The code is as follows:
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"regexp"
)

// Exists reports whether path exists on disk.
func Exists(path string) bool {
	_, err := os.Stat(path)
	if err != nil {
		return os.IsExist(err)
	}
	return true
}

// InArray reports whether item is present in items.
func InArray(items []string, item string) bool {
	for _, eachItem := range items {
		if eachItem == item {
			return true
		}
	}
	return false
}

func main() {
	defer func() {
		if err := recover(); err != nil {
			fmt.Print("错误信息:")
			fmt.Println(err)
		}
	}()
	var url string
	fmt.Print("公众号苏生不惑提示你请输入话题地址:")
	fmt.Scanln(&url)
	if len(url) == 0 {
		panic("话题地址为空")
	}
	client := &http.Client{}
	request, err := http.NewRequest("GET", url, nil)
	if err != nil {
		panic(err)
	}
	request.Header.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 ")
	response, err := client.Do(request)
	if err != nil {
		panic(err)
	}
	defer response.Body.Close()
	bResp, _ := io.ReadAll(response.Body)
	content := string(bResp)
	// Extract voice IDs and titles from the topic page HTML.
	voiceids := regexp.MustCompile(`data-voiceid="(.*)"`).FindAllStringSubmatch(content, -1)
	titles := regexp.MustCompile(`data-title="(.*)" data-voiceid`).FindAllStringSubmatch(content, -1)
	// URLs of already-downloaded audio are recorded here so reruns skip them.
	fileName := "wechat_topic_audio_list.txt"
	fileContent, _ := os.ReadFile(fileName)
	voiceURLs := regexp.MustCompile(`\n`).Split(string(fileContent), -1)
	var f2 *os.File
	for k, v := range voiceids {
		voiceURL := "https://res.wx.qq.com/voice/getvoice?mediaid=" + v[1]
		if InArray(voiceURLs, voiceURL) {
			fmt.Println("已经下载过音频:" + titles[k][1])
			continue
		}
		fmt.Println("正在下载音频:" + titles[k][1])
		res, err := http.Get(voiceURL)
		if err != nil {
			panic(err)
		}
		f, _ := os.Create(titles[k][1] + ".mp3")
		io.Copy(f, res.Body)
		res.Body.Close()
		f.Close()
		// Append the URL to the record file; create it on the first run.
		if Exists(fileName) {
			f2, _ = os.OpenFile(fileName, os.O_APPEND|os.O_WRONLY, 0666)
		} else {
			f2, _ = os.Create(fileName)
		}
		_, _ = f2.WriteString(voiceURL + "\n")
		f2.Close()
	}
	fmt.Print("下载完成")
}
Take this audio topic as an example:
Audio URLs are saved to wechat_topic_audio_list.txt; a second run likewise skips audio that has already been downloaded:
To guard against articles being deleted, you can also download the full article content/audio/video; for details see my earlier article "一键批量下载微信公众号文章内容/图片/封面/视频/音频,支持导出html和pdf格式,包含阅读数/点赞数/在看数/留言数". Article metadata is captured too: date, title, link, summary, author, cover image, whether it is original, IP location, and read/like/"Wow"/comment counts. For example, I scraped the 深圳卫健委 account's data (see "听说公众号深圳卫健委被网友投诉尺度大,我抓取了所有文章标题和阅读数分析了下").
Article comments can also be exported to Excel (article date, article title, article link, commenter nickname, comment text, like count, replies, and comment time); 深圳卫健委, for instance, received more than 16,000 comments in February alone. If there is an account you want downloaded or data you want scraped, contact me on WeChat at sushengbuhuo.
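As a rough sketch of what the export step looks like, here is how comment records could be written to Excel with pandas; the rows and column values below are made-up placeholders, not real data:

import pandas as pd

# Hypothetical rows standing in for what the scraper collects per comment.
comments = [
    {'文章日期': '2022-02-01', '文章标题': '示例标题',
     '文章链接': 'https://mp.weixin.qq.com/s/xxx',
     '留言昵称': '读者A', '留言内容': '写得好',
     '点赞数': 12, '回复': '', '留言时间': '2022-02-01 10:00'},
]

df = pd.DataFrame(comments)
df.to_excel('深圳卫健委_留言.xlsx', index=False)  # requires openpyxl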
While I was at it, I analyzed the IP locations in the comment section (see "微博/公众号/抖音等各大平台都显示 ip 归属地了,能改吗?"):
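The tally itself takes only a few lines; a minimal sketch, assuming the Excel export above and a hypothetical 'ip归属地' column:

from collections import Counter

import pandas as pd

# 'ip归属地' is a placeholder column name for the exported comment data.
df = pd.read_excel('深圳卫健委_留言.xlsx')
counts = Counter(df['ip归属地'].dropna())
for region, n in counts.most_common(10):
    print(region, n)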
PDF conversion and merging with bookmarks
Take 莫言's official account as an example: first download all of his articles (details in my earlier article "一键批量下载微信公众号文章内容/图片/封面/视频/音频,支持导出html和pdf格式,包含阅读数/点赞数/在看数/留言数"); judging by IP location, 莫言 posts from Shanghai:
The downloaded article HTML files are first batch-converted to PDF; I packaged this step into a tool as well.
import os
import pdfkit

def to_pdf():
    print('导出 PDF...')
    for root, dirs, files in os.walk('.'):
        for name in files:
            if name.endswith(".html"):
                print(name)
                try:
                    # pdfkit wraps wkhtmltopdf, which must be installed separately
                    pdfkit.from_file(name, 'pdf/' + name.replace('.html', '') + '.pdf')
                except Exception as e:
                    print(e)
Then merge all the converted PDFs into one file and generate bookmarks.
import os
import html
from PyPDF2 import PdfFileReader, PdfFileWriter

file_writer = PdfFileWriter()
num = 0
for root, dirs, files in os.walk('.'):
    for name in files:
        if name.endswith(".pdf"):
            print(name)
            file_reader = PdfFileReader(f"{name}")
            # One bookmark per source PDF, pointing at its first page.
            file_writer.addBookmark(html.unescape(name).replace('.pdf', ''), num, parent=None)
            for page in range(file_reader.getNumPages()):
                num += 1
                file_writer.addPage(file_reader.getPage(page))
with open(r"公众号苏生不惑历史文章合集.pdf", 'wb') as f:
    file_writer.write(f)

# Export the bookmark titles and page numbers to CSV.
def bookmark_export(lines):
    bookmark = ''
    for line in lines:
        if isinstance(line, dict):
            bookmark += line['/Title'] + ',' + str(line['/Page'] + 1) + '\n'
        else:
            bookmark += bookmark_export(line)
    return bookmark

with open('公众号苏生不惑历史文章合集.pdf', 'rb') as f:
    bookmark = bookmark_export(PdfFileReader(f).getOutlines())
with open('公众号苏生不惑历史文章合集.csv', 'a+', encoding='utf-8-sig') as f:
    f.write(bookmark)
Zhihu downloads
Enter a Zhihu column ID to batch-download all of the column's articles as PDF (see "周末又写了个知乎专栏批量下载工具,顺便通知个事"); take the column https://www.zhihu.com/column/c_1492085411900530689 as an example. Export result:
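As background, a tool like this can enumerate a column's articles through Zhihu's public web API. This is a minimal sketch, assuming the /api/v4/columns/{id}/items endpoint and its paging fields behave as they commonly do in public scraper write-ups; the endpoint may change at any time:

import requests

column_id = 'c_1492085411900530689'
url = f'https://www.zhihu.com/api/v4/columns/{column_id}/items'
headers = {'User-Agent': 'Mozilla/5.0'}

# Page through the column; each response carries paging.next for the next batch.
while url:
    data = requests.get(url, headers=headers).json()
    for item in data['data']:
        print(item['title'], item['url'])
    url = None if data['paging']['is_end'] else data['paging']['next']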
import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Bar, Pie

# pandas_data and question_id come from the answer-scraping step.
df = pd.DataFrame(pandas_data, columns=['name', 'counts'])
df.sort_values(by=['counts'], ascending=False, inplace=True)
books = df['name'].head(10).tolist()
counts = df['counts'].head(10).tolist()
print(', '.join(books))
bar = (
    Bar()
    .add_xaxis(books)
    .add_yaxis("", counts)
)
pie = (
    Pie()
    .add("", [list(z) for z in zip(books, counts)], radius=["40%", "75%"])
    # Set title and legend in a single call; a second set_global_opts call
    # would reset the options from the first one.
    .set_global_opts(
        title_opts=opts.TitleOpts(title="饼图", pos_left="center", pos_top="20"),
        legend_opts=opts.LegendOpts(type_="scroll", pos_left="80%", orient="vertical"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {d}%"))
)
pie.render(str(question_id) + '.html')
df.to_csv(str(question_id) + ".csv", encoding="utf_8_sig", index=False)
The result:
Answer content is also batch-downloaded to Excel, including each answerer's nickname and answer text:
Weibo content/image/video/comment downloads
This one is also written in Python; reply 微博 in the official account's chat to get the software. Enter the Weibo uid (see "一键批量下微博内容/图片/视频,获取博主最受欢迎微博,图片查找微博博主"), choose whether to download images and videos (1 = yes, 0 = no), and enter 2010-01-01 as the start time if you want to download everything.
The cookie requires logging in to the web version of Weibo at https://m.weibo.cn/ and copying it from the browser console.
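As a rough illustration of how the cookie gets used, here is a minimal sketch against m.weibo.cn's container API; the 107603 containerid prefix for a user's post list and the response layout are assumptions based on the public mobile-web interface, and uid/cookie are placeholders:

import requests

uid = '1767819164'            # placeholder Weibo uid
cookie = 'paste cookie here'  # copied from the browser after logging in

url = 'https://m.weibo.cn/api/container/getIndex'
params = {'containerid': f'107603{uid}', 'page': 1}
headers = {'User-Agent': 'Mozilla/5.0', 'Cookie': cookie}

data = requests.get(url, params=params, headers=headers).json()
for card in data['data']['cards']:
    if card.get('mblog'):
        print(card['mblog']['text'][:50])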
The downloaded comment/like/repost data plus images and videos are saved in the weibo directory. All the downloaded Weibo images:
import re
import time
import pandas
import requests

# headers (with the cookie), uid, and trimName() are defined earlier in the full script.
df = pandas.read_csv(f'{uid}.csv', encoding='utf_8_sig')
df = df[df['头条文章链接'].notnull()]
urls = df.头条文章链接.tolist()
for url in urls:
    try:
        res = requests.get(url, headers=headers, verify=False)
        title = re.search(r'<title>(.*?)</title>', res.text).group(1)
        weibo_time = re.search(r'<span class="time".*?>(.*?)</span>', res.text).group(1)
        # Dates within the current year omit the year; prepend it.
        if not weibo_time.startswith('20'):
            weibo_time = time.strftime('%Y') + '-' + weibo_time.strip().split(' ')[0]
        with open('articles/' + weibo_time + '_' + trimName(title) + '.html', 'w+', encoding='utf-8') as f:
            # Make protocol-relative resource URLs absolute so the saved page renders.
            f.write(res.text.replace('"//', '"https://'))
        print('下载微博文章', url)
    except Exception as e:
        print('错误信息', e, url)
The download result:
Scraping data
You can also scrape data without writing any code, using the Chrome extension Web Scraper (see "不用写代码,Chrome 扩展神器 web scraper 抓取知乎热榜/话题/回答/专栏,豆瓣电影" and "不会 Python 没关系,手把手教你用 web scraper 抓取豆瓣电影 top 250 和 b 站排行榜"). I exported the sitemaps I built, so you can import the JSON below directly. For example, scraping Weibo repost data:
{"_id":"weibo","startUrl":["https://weibo.com/1767819164/Lr7nQkAHl?type=repost"],"selectors":[{"id":"content","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.list_li","multiple":true,"delay":2000,"clickElementSelector":"a.page[action-data]","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"微博昵称","type":"SelectorText","parentSelectors":["content"],"selector":".WB_text a[usercard]","multiple":false,"regex":"","delay":0},{"id":"微博评论","type":"SelectorText","parentSelectors":["content"],"selector":".WB_text span","multiple":false,"regex":"","delay":0},{"id":"评论时间","type":"SelectorText","parentSelectors":["content"],"selector":".WB_from a","multiple":false,"regex":"","delay":0}]}
Scraping the Douban movie Top 250:
{"_id":"douban","startUrl":["https://movie.douban.com/top250?start=[0-250:25]&filter="],"selectors":[{"id":"row","type":"SelectorElement","parentSelectors":["_root"],"selector":".grid_view li","multiple":true,"delay":0},{"id":"电影名","type":"SelectorText","parentSelectors":["row"],"selector":"span.title","multiple":false,"regex":"","delay":0},{"id":"豆瓣链接","type":"SelectorLink","parentSelectors":["row"],"selector":".hd a","multiple":false,"delay":0},{"id":"电影排名","type":"SelectorText","parentSelectors":["row"],"selector":"em","multiple":false,"regex":"","delay":0},{"id":"电影简介","type":"SelectorText","parentSelectors":["row"],"selector":"span.inq","multiple":false,"regex":"","delay":0},{"id":"豆瓣评分","type":"SelectorText","parentSelectors":["row"],"selector":"span.rating_num","multiple":false,"regex":"","delay":0}]}
And scraping the Bilibili ranking at https://www.bilibili.com/v/popular/rank/all for video rank, title, views, danmaku count, uploader, likes, coins, favorites, and so on (see "分享几个让 b 站开挂的油猴脚本和chrome扩展"):
{"_id":"bilibili","startUrl":["https://www.bilibili.com/v/popular/rank/all"],"selectors":[{"delay":0,"id":"row","multiple":true,"parentSelectors":["_root"],"selector":"li.rank-item","type":"SelectorElement"},{"delay":0,"id":"视频排名","multiple":false,"parentSelectors":["row"],"regex":"","selector":"i.num","type":"SelectorText"},{"delay":0,"id":"视频标题","multiple":false,"parentSelectors":["row"],"regex":"","selector":"a.title","type":"SelectorText"},{"delay":0,"id":"播放量","multiple":false,"parentSelectors":["row"],"regex":"","selector":".detail-state > span:nth-of-type(1)","type":"SelectorText"},{"delay":0,"id":"弹幕数","multiple":false,"parentSelectors":["row"],"regex":"","selector":"span:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"up主","multiple":false,"parentSelectors":["row"],"regex":"","selector":"a span","type":"SelectorText"},{"delay":0,"id":"视频链接","multiple":false,"parentSelectors":["row"],"selector":"a.title","type":"SelectorLink"},{"delay":0,"id":"点赞数","multiple":false,"parentSelectors":["视频链接"],"regex":"","selector":"span.like","type":"SelectorText"},{"delay":0,"id":"投币数","multiple":false,"parentSelectors":["视频链接"],"regex":"","selector":"span.coin","type":"SelectorText"},{"delay":0,"id":"收藏数","multiple":false,"parentSelectors":["视频链接"],"regex":"","selector":"span.collect","type":"SelectorText"}]}
The exported Excel data:
WeChat avatar downloads
Due to changes in WeChat's API, batch-downloading friends' avatars no longer works (see "微信神器:一键批量下载微信好友头像并拼成一张图").
Script for logging in to multiple WeChat instances on a PC
A simple bat script lets you log in to as many WeChat instances as you like on one PC (see "如何在电脑上登陆多个微信"). Note that start treats its first quoted argument as a window title, so an empty pair of quotes has to come before the path:

start "" "D:\wechat\WeChat.exe"
start "" "D:\wechat\WeChat.exe"
start "" "D:\wechat\WeChat.exe"
Docin (豆丁) document downloads
Open the tool (see "2022 一键下载百度网盘/百度文库/豆丁文档/道客巴巴文档/原创力文档,顺便分享个百度网盘活动") and enter a Docin document URL to download it.
If the download fails, run the tool from the command line in its directory and check the error output; if it says chromedriver was not found, download it from https://registry.npmmirror.com/binary.html?path=chromedriver/ and you are set.
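To confirm the chromedriver you downloaded actually works with your Chrome, a quick check from the same directory (assuming, as the error message suggests, that the tool drives Chrome via selenium):

from selenium import webdriver

# Looks for chromedriver in the current directory / PATH; raises if the
# driver and browser versions do not match.
driver = webdriver.Chrome()
print(driver.capabilities.get('browserVersion'))
driver.quit()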
Toutiao downloads
Over the National Day holiday I looked into 今日头条: you can batch-download a Toutiao account's articles and micro-posts. Taking one account as an example, I downloaded all of its article HTML, which the tool above can batch-convert to PDF and merge into a single collection:
If this article helped you, please support it with a like / "Wow" (在看) / share. Thanks!